Abstract

Codon usage bias (CUB) in Drosophila is higher for X-linked genes than for autosomal genes. One possible explanation is 
that the higher effective recombination rate for genes on the X chromosome compared with the autosomes reduces their 
susceptibility to Hill-Robertson effects, and thus enhances the efficacy of selection on codon usage. The genome sequence 
of D. melanogaster was used to test this hypothesis. Contrary to expectation, it was found that, after correcting for the 
effective recombination rate, CUB remained higher on the X than on the autosomes. In contrast, an analysis of polymorphism
data from a Rwandan population showed that mean nucleotide site diversity at 4-fold degenerate sites for
genes on the X is approximately three-quarters of the autosomal value after correcting for the effective recombination 
rate, compared with approximate equality before correction. In addition, these data show that selection for preferred 
versus unpreferred synonymous variants is stronger on the X than the autosomes, which accounts for the higher CUB of 
genes on the X chromosome. This difference in the strength of selection does not appear to reflect the effects of 
dominance of mutations affecting codon usage, differences in gene expression levels between X and autosomes, or 
differences in mutational bias. Its cause therefore remains unexplained. The stronger selection on CUB on the X chromosome
leads to a lower rate of synonymous site divergence compared with the autosomes; this will cause a stronger
upward bias for X than A in estimates of the proportion of nonsynonymous mutations fixed by positive selection, for
methods based on the McDonald-Kreitman test.

Key words: Drosophila melanogaster, codon usage, effective population size, recombination, Hill-Robertson interference, 
gene expression. 



Introduction 

The genetic code is degenerate, such that most amino acids are encoded by more than one
synonymous codon. In a wide variety of organisms, the frequencies with which such synonymous
codons occur are nonrandom, that is, there is codon usage bias (CUB). In organisms such as
Drosophila, many bacteria and yeast, there is much evidence that CUB is at least in part a
result of natural selection, acting either on translational accuracy or on translational
efficiency (McVean and Charlesworth 1999, see figure 4). A striking observation on several
Drosophila species is that CUB is higher on the X chromosome than on the autosomes (Singh
et al. 2005a, 2005b, 2008), and neo-X chromosomes seem to be evolving higher levels of CUB
than their autosomal ancestors (Singh et al. 2008; Vicoso et al. 2008).

There are several possible reasons for the higher CUB for genes on the X chromosome. Stronger
selection on X-linked loci when the disfavored allele is recessive or partially recessive
could potentially cause such an effect (McVean and Charlesworth 1999; Singh et al. 2005a;
Vicoso and Charlesworth 2009a). Higher CUB of X-linked genes could be favored if dosage
compensation is incomplete, by compensating for lower levels of X chromosome gene expression
in males (Singh et al. 2005a). Finally, higher levels of gene expression in females for genes
on the X chromosome (Gupta et al. 2006; Sturgill et al. 2007) could lead to higher CUB on the
X, because high levels of gene expression appear to be associated with stronger selection for
CUB (Duret and Mouchiroud 1999; Drummond and Wilke 2009; Zeng and Charlesworth 2009), and
X-linked genes spend two-thirds of their time in females, and only one-third of their time
in males.

Another possible explanation is the difference in effective recombination rates between
X-linked and autosomal genes, and the implications of this difference for the effectiveness
of selection. The recombination rate is known to affect the efficacy of selection, due to
Hill-Robertson interference (HRI) among linked loci under selection. Consistent with this,
CUB in Drosophila is reduced in genomic regions with little or no recombination (Kliman and
Hey 1993; Haddrill et al. 2007; Campos et al. 2012). The rate of recombination on the X
chromosome and autosomes differs between males and females in Drosophila, because males lack
meiotic crossing over and gene conversion (Ashburner et al. 2005). The appropriate
sex-averaged recombination rate for the X that is relevant to population genetic processes is
thus two-thirds of the female recombination rate, as opposed to one-half for autosomal genes
(Langley et al. 1988); such averaging provides estimates of the "effective" recombination
rates (Charlesworth 2012b). This means that X-linked genes will be less subject than
autosomal genes to the effects of HRI from selection at linked sites (Vicoso and Charlesworth
2009a; Charlesworth 2012b), which could contribute to the higher CUB for the X chromosome
(Singh et al. 2005a, 2008).

The aim of this study is to use the genome sequence of D. melanogaster to determine the
influence of the difference in effective recombination rate between X and autosomes on CUB,
taking into account possible confounding effects of several factors known to influence CUB,
such as the level of gene expression, protein length, GC content and divergence (Duret and
Mouchiroud 1999; Singh et al. 2005b; Zeng and Charlesworth 2009). This was done by examining
whether the difference in CUB between the X chromosome (X) and autosomes (A) is removed if we
compare X-linked and autosomal genes with similar effective recombination rates. In addition,
to assess whether there is a difference in the effective population sizes between X-linked
and autosomal genes with comparable effective recombination rates (cf. Vicoso and
Charlesworth 2009a), we used whole genome resequencing data from a Rwandan population
(http://www.dpgp.org, last accessed January 7, 2013) to compare diversity levels and the
strength of selection on variants affecting codon usage at autosomal and X-linked loci.

Materials and Methods 

Coding Sequences 

Coding regions of the D. melanogaster genome (Release 5.34) 
were obtained from FlyBase (www.flybase.org, last accessed 
January 7, 2013). We excluded genes located within the 
heterochromatic nonrecombining regions and euchromatic 
genes with very low recombination rates (<0.05 cM/Mb) 
(Charlesworth 1996; Smith et al. 2007). 

Recombination Rate Estimates 

We divided each chromosome into 200 kb bins and calculated the recombination rate in each bin
using the D. melanogaster recombination rate calculator available from
http://petrov.stanford.edu/cgi-bin/recombination-rates_updateR5.pl (last accessed January 7,
2013) (Fiston-Lavier et al. 2010). We used the mid-coordinate of each gene to assign it to a
recombination bin. Sex-averaged recombination rates were obtained by multiplying the
recombination estimates for genes located on autosomal regions by one-half and those on the X
by two-thirds (see Introduction). To analyze genes on the X and autosomes with similar
effective recombination rates, an "overlap region" within the range 1-2.1 cM/Mb was defined
(oX, X chromosome overlap region; oA, autosomal overlap region), which contains only those
genes for which the effective recombination rates are similar. We also subdivided the overlap
region into three groups with respect to their recombination rates: low (1 to <1.40 cM/Mb),
intermediate (1.40 to <1.75 cM/Mb), and high (1.75 to <2.1 cM/Mb). Analyses were also
conducted on the "full" range of effective recombination rates, over the range
0.05-2.75 cM/Mb.
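The sex-averaging and binning steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the "X"/"A" chromosome labels are hypothetical, and the input is assumed to be the per-bin estimate returned by the recombination rate calculator.

```python
# Sex-averaging factors from the text: males lack recombination, so the
# estimate is multiplied by 2/3 for the X and by 1/2 for the autosomes.
SEX_AVERAGING = {"X": 2.0 / 3.0, "A": 1.0 / 2.0}

# Subdivisions of the overlap region in cM/Mb (lower bound inclusive,
# upper bound exclusive), as listed in the text.
OVERLAP_BINS = [
    ("low", 1.0, 1.40),
    ("intermediate", 1.40, 1.75),
    ("high", 1.75, 2.10),
]


def effective_rate(estimate_cm_per_mb, chromosome):
    """Effective (sex-averaged) recombination rate for an X or autosomal gene."""
    return estimate_cm_per_mb * SEX_AVERAGING[chromosome]


def overlap_category(effective):
    """Name of the overlap subdivision, or None if outside the overlap region."""
    for name, lower, upper in OVERLAP_BINS:
        if lower <= effective < upper:
            return name
    return None
```

For example, an autosomal gene in a bin estimated at 3.0 cM/Mb has an effective rate of 1.5 cM/Mb and falls in the intermediate subdivision.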

We also used an alternative measure of recombination in the middle of each chromosome. This
measure assumes that map distance is approximately linearly related to physical position in
the middle of each of the D. melanogaster chromosome arms (Charlesworth 1996), avoiding the
need to fit a polynomial equation to the data in this region (supplementary material 1,
Supplementary Material online). The results of analyses using this measure were very similar
to those presented later.

Variables Analyzed 

Estimates of the level of CUB from the frequency of optimal codons, Fop, were calculated
using CodonW (Peden 1999). The GC content of genes was estimated for the third positions of
codons (GC_3) and for the short introns (<80 bp; see Halligan and Keightley 2006) of the
selected isoform (GC_I), following removal of 8 bp/30 bp at the beginning/end of the introns,
and masking of possible exonic sequences to exclude any sites that may be subject to
selective constraints within the selected introns. Gene lengths were measured by the lengths
of the coding sequence (CDS). We used D. yakuba as an outgroup to estimate the ratio of
0-fold divergence to 4-fold divergence (K_0/K_4) using the Kimura two-parameter correction
(Kimura 1980), because it has enough divergence from D. melanogaster to avoid any major
effects of ancestral polymorphisms, and its genome is well annotated with high coverage
(9.1X) (Clark et al. 2007). Details of the criteria used to obtain orthologous coding
sequences are described by Campos et al. (2012).
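Two of the quantities named above, GC content at third codon positions and the Kimura two-parameter distance, can be illustrated with a short sketch. The function names are hypothetical, the inputs are assumed to be in-frame or pre-aligned sequences, and sites with non-ACGT characters are simply skipped; the extraction of 0-fold and 4-fold sites needed for K_0 and K_4 is not shown.

```python
import math

PURINES = {"A", "G"}


def gc3(cds):
    """Fraction of G or C among called bases at third codon positions."""
    thirds = [cds[i + 2].upper() for i in range(0, len(cds) - 2, 3)]
    called = [base for base in thirds if base in "ACGT"]
    return sum(base in "GC" for base in called) / len(called)


def k2p_distance(seq1, seq2):
    """Kimura (1980) two-parameter distance between two aligned sequences."""
    pairs = [
        (a, b)
        for a, b in zip(seq1.upper(), seq2.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    n = len(pairs)
    # Transitions stay within purines or within pyrimidines; transversions do not.
    transitions = sum(
        a != b and ((a in PURINES) == (b in PURINES)) for a, b in pairs
    )
    transversions = sum(
        a != b and ((a in PURINES) != (b in PURINES)) for a, b in pairs
    )
    p = transitions / n
    q = transversions / n
    return -0.5 * math.log((1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q))
```

Applying `k2p_distance` separately to the 0-fold and 4-fold sites of an alignment and dividing the two results would give a ratio of the K_0/K_4 kind.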

Diversity Estimates 

To estimate nucleotide site diversities (π), we used sequence data on a population of
D. melanogaster from Gikongoro (RG) in Rwanda, available from the Drosophila Population
Genomics Project (DPGP: http://www.dpgp.org/, last accessed January 7, 2013). We chose
genomes and individuals with a sequencing depth coverage of 25X (the RG primary core), from a
total of 22 lines. We selected a minimum quality value of 31 and masked any regions below
that threshold. Moreover, we masked regions showing evidence of putative cosmopolitan
admixture (recent gene flow from outside Africa), as identified by an identity by descent
analysis carried out by the DPGP. Any ambiguous nucleotides were masked as well. We used the
script dpgp_fastq_2_fasta.pl (provided by the DPGP) to convert and mask the FastQ files into
fasta files. Because of masked sites, 22 alleles were not always available for each site, so
we calculated composite estimates of π at 0-fold (π_0) and 4-fold (π_4) sites. For a given
site, π was estimated as the product of k/(k - 1) and 1 - Σ_i p_i^2, where p_i is the
frequency of the ith variant at the site, and k is the number of alleles sequenced (Nei 1987,
p. 256). We calculated π for all sites with the same k, and provide a weighted average of π
according to the number of sites in each k category. We rejected any sites where we had fewer
than 15 unmasked alleles.
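The per-site estimator and the weighting by k category can be sketched as below. This is an illustrative reading of the description, with hypothetical function names; each site is assumed to be given as a string of alleles with "N" marking masked calls.

```python
from collections import Counter, defaultdict

MIN_ALLELES = 15  # sites with fewer unmasked alleles are rejected (from the text)


def site_diversity(alleles):
    """Diversity at one site: k/(k - 1) * (1 - sum of squared allele frequencies)."""
    k = len(alleles)
    counts = Counter(alleles)
    homozygosity = sum((count / k) ** 2 for count in counts.values())
    return k / (k - 1) * (1.0 - homozygosity)


def composite_diversity(sites, masked="N"):
    """Average of per-k mean diversities, weighted by the number of sites per k."""
    per_k = defaultdict(list)
    for column in sites:
        alleles = [allele for allele in column if allele != masked]
        if len(alleles) >= MIN_ALLELES:
            per_k[len(alleles)].append(site_diversity(alleles))
    total_sites = sum(len(values) for values in per_k.values())
    if total_sites == 0:
        return None
    weighted_sum = sum(
        len(values) * (sum(values) / len(values)) for values in per_k.values()
    )
    return weighted_sum / total_sites
```

Weighting each k category by its number of sites makes the composite estimate equal to the mean over all retained sites, so categories with few sites contribute proportionally little.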

Gene Expression Data 

As described by Campos et al. (2012), we used RNAseq gene expression available for
D. melanogaster in FlyBase (2012). For each D. melanogaster gene, we analyzed the levels of
gene expression in adults for females and males separately, as the average expression of the
three adult stages available (1, 5, and 30 days). We analyzed gene expression as
log2(RPKM + 1), where RPKM is reads per kilobase of exon per million mapped reads. We also
calculated an overall level of gene expression for each gene across all the developmental
stages of the data set; for autosomal genes, we used the average of the two sexes, whereas
for X chromosomal genes we used a weighted average of 2/3 for females and 1/3 for males,
reflecting the mean time that an X chromosome spends in each sex.
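The sex weighting of expression can be written out as a small sketch. The function names are hypothetical, and the choice to average RPKM values before taking the logarithm is an assumption, based on the table 1 note that expression is measured as log2(mean RPKM + 1).

```python
import math


def sex_weighted_rpkm(female_rpkm, male_rpkm, chromosome):
    """Weighted mean RPKM: 2/3 female and 1/3 male for the X, equal weights otherwise."""
    if chromosome == "X":
        return (2.0 * female_rpkm + male_rpkm) / 3.0
    return (female_rpkm + male_rpkm) / 2.0


def log_expression(rpkm):
    """Expression on the log2(RPKM + 1) scale used in the analyses."""
    return math.log2(rpkm + 1.0)
```

For an X-linked gene with a female RPKM of 9 and a male RPKM of 3, the weighted mean is 7, which gives an expression level of 3 on the log2(RPKM + 1) scale.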

Final Data Set 

The final data set included only genes with expression data (RPKM > 0), K_4 > 0 and < 0.50,
amino acid length > 29, percentage of amino acid sequence identity more than 50%, less than
50% gaps, and the presence of a single orthologous gene in D. yakuba. The number of genes
analyzed in this study was 6,604 (569 X, 6,035 A) for the overlap region and 9,224 (1,545 X,
7,679 A) for the full set.

Statistical Analyses 

We used the Mann-Whitney U test (two-tailed) to compare data sets. We controlled for the
false discovery rate (FDR) by the method of Benjamini and Hochberg (1995), implemented in the
package multtest (Pollard et al. 2005), with an FDR threshold of 0.05. For each data set and
variable, we calculated the mean and estimated a confidence interval (CI) by bootstrapping
across genes. We performed paired one-sided Wilcoxon tests to examine whether the mean level
of gene expression in females is higher than that in males.
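For readers unfamiliar with these two procedures, the following sketch shows a Benjamini-Hochberg adjustment and a percentile bootstrap CI for a mean. It is a plain Python illustration, not the multtest implementation used in the study; the function names, the number of bootstrap replicates, and the use of percentile intervals are all assumptions.

```python
import random


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the mean, resampling observations (genes)."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[int((alpha / 2.0) * n_boot)]
    upper = means[int((1.0 - alpha / 2.0) * n_boot) - 1]
    return lower, upper
```

A test is then declared significant when its adjusted P value is below the chosen FDR threshold (0.05 in this study).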

We calculated partial correlations between Fop and recombination rate, CDS length, gene
expression and GC_I, while controlling for their covariates (K_0, K_4, effective
recombination rate, overall gene expression, GC_I and CDS length), using the R function
"pcor.test" (a variance-covariance matrix method) available at
http://www.yilab.gatech.edu/pcor.R (last accessed January 7, 2013) (Kim and Yi 2006); we
report Spearman's nonparametric correlation coefficients, with 95% CIs obtained by
bootstrapping across genes.
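The idea behind a matrix-based Spearman partial correlation can be sketched in plain Python: rank-transform every variable, build the correlation matrix of the ranks, invert it, and read the partial correlation from the inverse. This is a generic illustration of that approach with hypothetical function names; it is not the code of the R function pcor.test and does not compute P values.

```python
import math


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            result[order[position]] = average
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [value / scale for value in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * w for v, w in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]


def partial_spearman(x, y, covariates=()):
    """Spearman partial correlation of x and y, controlling for the covariates."""
    columns = [ranks(column) for column in (x, y, *covariates)]
    corr = [[pearson(a, b) for b in columns] for a in columns]
    precision = invert(corr)
    return -precision[0][1] / math.sqrt(precision[0][0] * precision[1][1])
```

With no covariates the function reduces to the ordinary Spearman coefficient, which gives a quick sanity check on any data set.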

Estimating Selection on CUB, and Mutational and 
Demographic Parameters 

An extension of the method of Zeng and Charlesworth (2009, 2010a) was used to test for
differences in the intensity of selection on codon bias and the effective population size
between autosomal and X-linked genes in the overlap region. This model infers the parameters
from DNA sequence polymorphism data, and takes account of the potential effects of recent
population size changes by allowing a one-step change in population size. Let N_e be the
effective population size of the autosomes before this change. The scaled mutation rates away
from and towards the unpreferred codons are θ = 4N_e u and κθ, respectively, where u is the
"raw" mutation rate from unpreferred to preferred codons. The ratio of the effective
population size of the X chromosome to that of the autosomes is denoted by λ, so that the
effective size of the X chromosome is λN_e. On the assumption of semidominance, selection on
CUB can be characterized by γ_X = 4λN_e s_X and γ_A = 4N_e s_A, where s_X and s_A are the
selection coefficients for heterozygotes for the X and autosomes, respectively. The
population is assumed to be at statistical equilibrium until t generations ago, at which
point its size changes g-fold instantly, such that the effective population sizes become
gN_e for the autosomes and gλN_e for the X chromosome, respectively. Following previous
usage, we define the scaled time as τ = t/(2gN_e).
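One consequence of these definitions is that the ratio of the scaled selection parameters, γ_X/γ_A, equals λ s_X/s_A, so the relative strength of selection on the X can be recovered from the three estimated quantities. The sketch below illustrates this arithmetic; the function name is hypothetical, and the example values are the L_1 estimates reported in table 4.

```python
def selection_coefficient_ratio(gamma_x, gamma_a, lam):
    """s_X / s_A implied by gamma_X = 4*lam*N_e*s_X and gamma_A = 4*N_e*s_A."""
    return gamma_x / (lam * gamma_a)


# L_1 estimates from table 4: gamma_X = 1.53, gamma_A = 1.36, lambda = 0.75.
ratio = selection_coefficient_ratio(1.53, 1.36, 0.75)
print(round(ratio, 2))
```

Under these point estimates the implied s_X/s_A is about 1.5; this calculation ignores the sampling error of the estimates.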

The full model, denoted by L_1, thus has seven parameters: θ, κ, γ_X, γ_A, λ, g and τ. When
g = 1 and/or τ = ∞, L_1 reduces to a model with constant population size, denoted by L_0. The
log-likelihood of the data under L_0 and L_1 can be calculated using equations (1) and (2) of
Haddrill et al. (2011). Maximum likelihood (ML) estimates of the parameters were searched for
by using multidimensional optimization algorithms without derivatives (see Press et al. 1992,
section 10.4; Lau 2003, section 5.2.4). Multiple random starting points were used to
initialize the algorithms, and the algorithms were iterated until they converged.
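Because L_0 is nested within L_1, the two models can be compared with a likelihood ratio test. The sketch below shows the arithmetic using the log-likelihoods reported in table 4. The function names are hypothetical, and the use of a chi-square reference distribution with 2 degrees of freedom (for the two extra parameters g and τ) is an assumption made here for illustration.

```python
import math


def likelihood_ratio_statistic(ln_l_null, ln_l_full):
    """Twice the difference in maximized log-likelihood between nested models."""
    return 2.0 * (ln_l_full - ln_l_null)


def chi2_upper_tail_2df(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    # For 2 degrees of freedom the survival function has the closed form exp(-x/2).
    return math.exp(-statistic / 2.0)


# ln L values from table 4 for the constant-size (L_0) and full (L_1) models.
statistic = likelihood_ratio_statistic(-2366568.26, -2365196.24)
p_value = chi2_upper_tail_2df(statistic)
```

The statistic computed from the table 4 values is approximately 2,744, and the corresponding tail probability is far below 10^-16.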

Results 

Codon Usage and GC Content of Genes on X and A 
For genes over the full range of recombination rates, the mean effective recombination rate
(Rec) for X genes was higher than for A genes (Rec_X = 2.08 cM/Mb vs. Rec_A = 1.39 cM/Mb;
P < 10^-16; table 1). Consistent with the results of previous studies (Singh et al. 2005a;
Gupta et al. 2006; Sturgill et al. 2007; Zhang and Oliver 2007), X chromosome genes, in both
the full data set and the overlap region, had significantly higher levels of Fop, GC content,
gene expression in females and CDS length than autosomal genes (table 1). The mean X/A ratio
for Fop was 1.06 (CI = 1.05-1.07) and 1.08 (CI = 1.06-1.09), for the whole and overlap
regions, respectively, despite the longer average coding sequence length of genes on the X
chromosome, and the well-known negative association between gene length and Fop (Duret and
Mouchiroud 1999). The level of gene expression (exp.) in males was similar for X and A in the
full data set (X male exp. = 9.45, A male exp. = 9.50, P = 0.204; table 1), but marginally
significantly higher for A than X in the overlap region (X male exp. = 9.32, A male
exp. = 9.48, P = 0.034; table 1).

Table 1. Variables Analyzed for the Full and Overlap Region Data Sets.

Full data set
Variable         X                            A                            P
N                1,545                        7,679
Rec              2.08 (2.05-2.11)             1.39 (1.37-1.40)             <10^-16
Fop              0.551 (0.546-0.555)          0.518 (0.516-0.520)          <10^-16
GC_3             0.688 (0.683-0.692)          0.641 (0.639-0.643)          <10^-16
GC_I             0.393 (0.387-0.400)          0.352 (0.349-0.355)          <10^-16
π_0              0.00130 (0.00122-0.00137)    0.00162 (0.00157-0.00166)    3 x 10^-10
π_4              0.0152 (0.0147-0.0157)       0.0159 (0.0156-0.0162)       0.675
π_4 corrected    0.0203 (0.0196-0.021)        0.0159 (0.0156-0.0162)       <10^-16
K_0              0.040 (0.037-0.042)          0.038 (0.037-0.039)          0.069
K_4              0.240 (0.236-0.244)          0.248 (0.246-0.250)          6 x 10^-5
Overall exp.     9.90 (9.80-10.0)             9.78 (9.73-9.83)             0.206
Female exp.      9.09 (8.90-9.27)             8.30 (8.21-8.39)             2 x 10^-13
Male exp.        9.45 (9.33-9.58)             9.50 (9.44-9.56)             0.204
CDS length       538 (514-563)                493 (484-502)                7 x 10^-4

Overlap region
Variable         oX                           oA                           P
N                569                          6,035
Rec              1.61 (1.58-1.63)             1.61 (1.60-1.62)             0.606
Fop              0.558 (0.551-0.566)          0.519 (0.516-0.521)          <10^-16
GC_3             0.698 (0.690-0.705)          0.642 (0.640-0.644)          <10^-16
GC_I             0.418 (0.408-0.430)          0.351 (0.348-0.354)          <10^-16
π_0              0.00123 (0.0011-0.00136)     0.00177 (0.00172-0.00182)    <10^-16
π_4              0.0129 (0.0121-0.0135)       0.0181 (0.0178-0.0184)       <10^-16
π_4 corrected    0.0171 (0.0163-0.0180)       0.0181 (0.0178-0.0184)       0.061
K_0              0.041 (0.037-0.044)          0.038 (0.037-0.039)          0.034
K_4              0.238 (0.231-0.244)          0.248 (0.246-0.250)          8 x 10^-4
Overall exp.     9.88 (9.70-10.04)            9.78 (9.72-9.84)             0.508
Female exp.      9.14 (8.86-9.40)             8.28 (8.19-8.39)             8 x 10^-7
Male exp.        9.32 (9.09-9.52)             9.48 (9.41-9.55)             0.034
CDS length       541 (503-575)                498 (488-509)                0.004

Note: For each variable, we report the mean with 95% CIs in parentheses. We examined four
regions: X, A, oX, and oA. P, adjusted P value of the Mann-Whitney U test for differences
between X and A (values below 0.05 are significant); π_4 corrected for the X are the raw
values multiplied by 4/3; Rec, effective recombination rate (cM per Mb times 2/3 for X and
1/2 for A); GC_3, GC content of third codon positions; GC_I, GC content of short introns
(<80 bp); Exp., gene expression as measured by log2(mean RPKM + 1); CDS length, coding
sequence length in number of amino acids.

In each of the overlap regions considered separately, the mean effective recombination rate
was similar for the X and A genes (Rec = 1.61, P = 0.6; table 1), with a fairly narrow range
of values within each category (table 2). There were significantly higher levels of Fop, GC_3
and GC_I for X versus A in the low and intermediate recombination regions, but not for the
high recombination regions (table 2), with the exception of GC_3, which was significantly
higher for the X in all regions. The mean X/A ratio for Fop was significantly above one for
the low and intermediate recombination regions (95% CI: 1.06-1.09 and 1.05-1.09,
respectively), but not for the high recombination region (CI: 0.998-1.05). The top left panel
of figure 1 shows that Fop for the X is consistently higher than for A for the same effective
recombination rate over much of the range of recombination rates.

A comparison of the three regions displays the previously observed tendency for Fop and the
GC content of X chromosomal genes to decline substantially with the recombination rate (Singh
et al. 2005b); in contrast, this effect is absent from the autosomes (table 2). The effect of
recombination was confirmed by examining the partial correlations between Fop and
recombination rate for the full data set and for all the genes in the overlap regions,
holding expression level, K_0, K_4, GC_I and coding sequence length constant (table 3 and
fig. 1); the Spearman rank partial correlation coefficients (r_s) are -0.077 (P = 0.019) and
-0.315 (P = 10^-10) for the whole X and overlap region of the X, respectively, but only
-0.009 (P = 0.57) and -0.022 (P = 0.13) for the autosomes. The relationship between
recombination and GC_I shows a similar pattern, with highly significant r_s values of -0.303
(P < 10^-16) and -0.500 (P < 10^-16) for the whole X and the overlap region, respectively,
but nonsignificant (P > 0.1) values for the autosomes. In addition, Fop and GC_I have
significantly positive partial correlations for both the X genes (whole X r_s = 0.260,
P < 10^-16; overlap X r_s = 0.150, P = 0.003) and A genes (whole A r_s = 0.273, P < 10^-16;
overlap A r_s = 0.269, P < 10^-16).

Table 2. Variables Analyzed for the Three Subsets of the Overlap Regions with Respect to
Recombination Rate: Low (1-1.4 cM/Mb), Intermediate (1.40-1.75 cM/Mb), and High
(1.75-2.1 cM/Mb).

Variable         Low oX                       Low oA                       P
N                167                          1,089
Rec              1.21 (1.20-1.23)             1.24 (1.23-1.24)             0.133
Fop              0.596 (0.584-0.608)          0.508 (0.502-0.513)          <10^-16
GC_3             0.741 (0.731-0.753)          0.629 (0.623-0.635)          <10^-16
GC_I             0.477 (0.459-0.494)          0.345 (0.338-0.353)          <10^-16
π_0              0.00118 (0.00092-0.00139)    0.00173 (0.00161-0.00185)    <10^-16
π_4              0.0103 (0.0092-0.0114)       0.0153 (0.0147-0.0159)       3 x 10^-9
π_4 corrected    0.0137 (0.0123-0.0151)       0.0153 (0.0147-0.0159)       0.115
K_0              0.039 (0.033-0.045)          0.039 (0.037-0.042)          0.504
K_4              0.226 (0.215-0.237)          0.249 (0.244-0.254)          0.001
Overall exp.     10.19 (9.90-10.50)           9.71 (9.57-9.86)             0.030
Female exp.      9.70 (9.22-10.17)            7.94 (7.70-8.19)             3 x 10^-6
Male exp.        9.73 (9.40-10.04)            9.22 (9.06-9.39)             0.184
CDS length       548 (463-621)                504 (477-532)                0.270

Variable         Intermediate oX              Intermediate oA              P
N                193                          3,195
Rec              1.58 (1.56-1.59)             1.59 (1.59-1.59)             0.162
Fop              0.564 (0.554-0.575)          0.527 (0.523-0.530)          8 x 10^-9
GC_3             0.708 (0.698-0.719)          0.652 (0.648-0.655)          <10^-16
GC_I             0.431 (0.415-0.444)          0.357 (0.352-0.361)          3 x 10^-14
π_0              0.00116 (0.00095-0.00134)    0.00172 (0.00165-0.00179)    1 x 10^-5
π_4              0.0127 (0.0115-0.0137)       0.0179 (0.0175-0.0183)       3 x 10^-10
π_4 corrected    0.0169 (0.0154-0.0184)       0.0179 (0.0175-0.0183)       0.298
K_0              0.041 (0.035-0.046)          0.037 (0.036-0.038)          0.097
K_4              0.245 (0.234-0.258)          0.244 (0.241-0.247)          0.853
Overall exp.     9.62 (9.33-9.91)             9.77 (9.68-9.85)             0.399
Female exp.      8.83 (8.38-9.28)             8.33 (8.18-8.46)             0.188
Male exp.        8.98 (8.60-9.39)             9.49 (9.39-9.59)             <10^-16
CDS length       503 (454-549)                500 (485-514)                0.130

Variable         High oX                      High oA                      P
N                209                          1,751
Rec              1.95 (1.94-1.97)             1.88 (1.88-1.89)             <10^-16
Fop              0.523 (0.509-0.536)          0.511 (0.507-0.515)          0.133
GC_3             0.653 (0.642-0.665)          0.633 (0.628-0.637)          0.015
GC_I             0.352 (0.335-0.369)          0.345 (0.341-0.351)          0.342
π_0              0.00133 (0.00111-0.00155)    0.00188 (0.00178-0.00198)    <10^-16
π_4              0.0151 (0.0138-0.0162)       0.0203 (0.0198-0.0208)       1 x 10^-9
π_4 corrected    0.0201 (0.0184-0.0216)       0.0203 (0.0197-0.0209)       0.908
K_0              0.042 (0.036-0.048)          0.040 (0.038-0.042)          0.417
K_4              0.240 (0.227-0.252)          0.254 (0.250-0.258)          0.010
Overall exp.     9.87 (9.59-10.2)             9.86 (9.75-9.97)             0.997
Female exp.      8.97 (8.49-9.46)             8.42 (8.23-8.60)             0.069
Male exp.        9.30 (8.96-9.63)             9.61 (9.48-9.75)             0.096
CDS length       570 (503-634)                490 (470-511)                0.040

Note: P, adjusted P value of the Mann-Whitney U test for differences between X and A (values
below 0.05 are significant).

Fig. 1. Pairwise relationships between several genomic variables. The variables considered
are CUB (Fop), effective recombination rate (Rec), CDS length, overall gene expression, and
GC content in short introns (GC_I). The relationships between these variables are
investigated in four different data sets: oA, autosomal genes in the overlap region; oX,
X-linked genes in the overlap region; A, autosomal genes in the full data set which spans the
full range of effective recombination rates; and X, X-linked genes in the full data set. We
plot the Loess regression lines for each data set and pairwise comparison. We show the
Spearman's rank correlation coefficients and their significance (***P < 0.001; **P < 0.01;
*P < 0.05).

Table 3. Relationships between Pairs of Variables Affecting CUB.

Pair of variables   X                       A                       oX                      oA                      Covariates
Fop-Rec             -0.077 (0.019)          -0.009 (0.568)          -0.315 (1 x 10^-10)     -0.022 (0.127)          Exp., K_0, K_4, GC_I, CDS length
                    (-0.140/-0.015)         (-0.037/0.017)          (-0.411/-0.222)         (-0.052/0.012)
Rec-GC_I            -0.303 (<10^-16)        -0.027 (0.120)          -0.500 (<10^-16)        -0.026 (0.168)          None
                    (-0.362/-0.247)         (-0.053/-0.002)         (-0.582/-0.427)         (-0.055/0.005)
Fop-GC_I            0.260 (<10^-16)         0.273 (<10^-16)         0.150 (0.003)           0.269 (<10^-16)         Rec, K_0, K_4, Exp., CDS length
                    (0.200/0.322)           (0.247/0.298)           (0.044/0.244)           (0.241/0.299)
Fop-CDS length      -0.273 (<10^-16)        -0.171 (<10^-16)        -0.269 (3 x 10^-8)      -0.164 (<10^-16)        Rec, K_0, K_4, Exp., GC_I
                    (-0.337/-0.217)         (-0.198/-0.144)         (-0.369/-0.175)         (-0.199/-0.133)
Fop-Exp.            0.242 (5 x 10^-15)      0.310 (<10^-16)         0.235 (2 x 10^-6)       0.298 (<10^-16)         Rec, K_0, K_4, GC_I, CDS length
                    (0.180/0.303)           (0.284/0.337)           (0.143/0.340)           (0.266/0.325)
Exp.-               0.013 (0.68)            0.007 (0.59)            0.032 (0.53)            0.015 (0.34)            Rec, K_0, K_4, Exp., CDS length, Fop
                    (-0.050/0.077)          (-0.022/0.034)          (-0.072/0.126)          (-0.019/0.048)

Note: Correlations among CUB (Fop), effective recombination rate (Rec), gene expression
(Exp.), divergence levels (K_0 and K_4), and GC content in short introns (GC_I). The
covariates whose effects were controlled for are shown in the last column. We examined four
regions: X, A, oX, and oA. Spearman's rank partial correlation coefficients are displayed
with their significance levels in brackets (values below 0.05 are significant); 95% CIs for
the correlations are shown below in parentheses.

Diversity Values for Sites on X and A

In the full data set, the mean nucleotide site diversities at 4-fold degenerate sites (π_4)
were similar on X and A, at 0.0152 and 0.0159, respectively (P = 0.67; table 1); if the X
diversity values are multiplied by 4/3, their mean is significantly higher than that for the
autosomes (4π_4X/3 = 0.0203, π_4A = 0.0159, P < 10^-16; table 1). This indicates that in the
full data set the mean X diversity is greater than three-quarters of the mean A diversity,
the relation expected under neutrality when there is purely random variation in offspring
number among both males and females (Wright 1931). Consistent with this, the 95% CI for the
ratio of mean X diversity to mean A diversity does not overlap 3/4 (0.92-0.99). However,
within the overlap region as a whole, we observed a significantly lower mean π_4 for X than A
(π_4X = 0.0129 vs. π_4A = 0.0181, P < 10^-16; table 1), and the X and A values did not differ
significantly after multiplying the X values by 4/3 (4π_4X/3 = 0.0171 vs. π_4A = 0.0181,
P = 0.061; table 1). The 95% CIs of the ratio 4π_4X/(3π_4A) for the three subdivisions of the
overlap region are [0.80, 0.99], [0.86, 1.04], and [0.91, 1.08], respectively, implying that
the X/A diversity ratios for these regions do not differ significantly from three-quarters;
if anything, they are slightly lower. In accordance with the results of earlier studies of
the relation between recombination rate and silent site diversity (Charlesworth 2012a), if
π_4 is plotted against the effective recombination rate, it is seen to be highest for the
high recombination regions for both X and A, and lowest for the low recombination regions;
4π_4X/3 is similar to π_4A for the same effective recombination rate over most of the range
of recombination rates (fig. 2). Overall, these results agree with a previous analysis of a
much smaller data set (Vicoso and Charlesworth 2009a).
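The 4/3 correction and the comparison with the neutral expectation of three-quarters amount to simple arithmetic on the mean diversities. The sketch below uses the mean π_4 values reported in table 1; the function names are hypothetical, and the neutral X/A ratio of 3/4 follows from there being three X chromosomes for every four autosomes in a population with equal numbers of males and females.

```python
NEUTRAL_XA_RATIO = 3.0 / 4.0  # expected X/A diversity ratio under neutrality


def corrected_x_diversity(pi_x):
    """Scale X-linked diversity by 4/3 so it is directly comparable with autosomes."""
    return pi_x * 4.0 / 3.0


def xa_ratio(pi_x, pi_a):
    """Raw ratio of X-linked to autosomal diversity."""
    return pi_x / pi_a


# Mean pi_4 values from table 1.
full_ratio = xa_ratio(0.0152, 0.0159)      # full data set
overlap_ratio = xa_ratio(0.0129, 0.0181)   # overlap region
full_corrected = corrected_x_diversity(0.0152)
```

With these inputs the raw X/A ratio is about 0.96 in the full data set and about 0.71 in the overlap region, which is the contrast described in the text.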

Fig. 2. Effective recombination rate (cM/Mb) versus 4-fold synonymous diversity (π_4) for the
autosomes and 4-fold synonymous diversity multiplied by 4/3 (π_4 corrected) for the X
chromosome. Bold lines represent Loess regression lines, in green for the autosomal genes and
in red for the X chromosome genes. Dashed lines represent the CIs for the lines. The two
vertical lines indicate the lower and upper ends of the overlap region.

In contrast to the behavior of π_4, table 1 shows that the diversities at 0-fold sites (π_0)
are much lower for the whole X chromosome than for the whole autosomes (π_0X = 0.00130 vs.
π_0A = 0.00162, P = 3 x 10^-10, π_0X/π_0A = 0.80; table 1), with a similar contrast in the
overlap region (π_0X = 0.00123 vs. π_0A = 0.00177, P < 10^-16; π_0X/π_0A = 0.70; table 1). A
similar pattern is evident for the subdivisions of the overlap region, and π_0 is only
slightly affected by the recombination rate. These results are consistent with purifying
selection against mutations that change the amino acid sequence of proteins, and with
stronger purifying selection against X mutations compared with A mutations, possibly
reflecting the effect of hemizygosity of the X in males in increasing the effectiveness of
purifying selection (Vicoso and Charlesworth 2006).

Indeed, the X/A ratios for π_0 are not far from the value of three-quarters expected for
deleterious mutations at mutation-selection equilibrium when there is semidominance and equal
strengths of selection on X and A in both sexes. However, when there is selection only on
females for X-linked genes, and selection on both sexes for autosomal genes, regardless of
the degree of dominance, the expected X/A ratio for π_0 is 1.5 under mutation-selection
equilibrium. Interestingly, under a second special case with selection only on females for
X-linked genes, but selection on only one sex for autosomal genes, the expected X/A ratio
would again be three-quarters (supplementary material 2, Supplementary Material online).
Therefore, despite the evidence that the X chromosome of Drosophila is enriched for genes
with female-biased expression relative to the autosomes (e.g., Sturgill et al. 2007; Meisel
et al. 2012), and deficient in genes with male-biased expression, female-specific selection
on X-linked genes cannot in itself account for the observed X/A ratio for π_0, unless there
is highly sex-specific selection on autosomal genes as well.
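The three expected ratios quoted above can be reproduced with standard deterministic approximations for mutation-selection balance. The sketch below is not the derivation given in the supplementary material; it assumes equal mutation rates on X and A, rare deleterious variants (so that diversity is proportional to the equilibrium frequency), and the textbook equilibrium formulas stated in the comments. All function and parameter names are hypothetical.

```python
def autosomal_equilibrium(u, h, s_female, s_male):
    """Equilibrium frequency u / (mean heterozygous selection across the sexes)."""
    return u / (h * (s_female + s_male) / 2.0)


def x_linked_equilibrium(u, h, s_female, s_male):
    """Equilibrium frequency 3u / (2*h*s_female + s_male) for an X-linked locus.

    Two-thirds of X chromosomes are in females (heterozygous effect h*s_female)
    and one-third are in hemizygous males (effect s_male).
    """
    return 3.0 * u / (2.0 * h * s_female + s_male)


def expected_xa_ratio(h, x_sexes, a_sexes, u=1e-8, s=0.01):
    """Expected X/A ratio when selection acts only in the listed sexes."""
    x_f, x_m = (s if "f" in x_sexes else 0.0), (s if "m" in x_sexes else 0.0)
    a_f, a_m = (s if "f" in a_sexes else 0.0), (s if "m" in a_sexes else 0.0)
    return x_linked_equilibrium(u, h, x_f, x_m) / autosomal_equilibrium(u, h, a_f, a_m)
```

Under these assumptions, `expected_xa_ratio(0.5, "fm", "fm")` gives three-quarters, `expected_xa_ratio(h, "f", "fm")` gives 1.5 for any h, and `expected_xa_ratio(h, "f", "f")` gives three-quarters again.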

In contrast, there is no significant difference between the X and A with respect to K_0 for
the whole chromosome comparisons (K_0X = 0.040 vs. K_0A = 0.038, P = 0.07; table 1), and K_0
is slightly higher for the X than A in the overlap region (K_0X = 0.041 vs. K_0A = 0.038,
P = 0.034; table 1); K_4 for X is significantly lower than for A in both cases (whole region:
K_4X = 0.240 vs. K_4A = 0.248, P = 6 x 10^-5; overlap region: K_4X = 0.238 vs. K_4A = 0.248,
P = 8 x 10^-4; table 1). Since theory suggests that the rate of fixation of deleterious
mutations for the X should be the same as, or slower than, for the autosomes in Drosophila
(Mank et al. 2010), the higher K_0 for the X may reflect the substantial contribution of
adaptive evolution to nonsynonymous divergence in Drosophila (Sella et al. 2009), which could
partially obscure the contribution from the fixation of slightly deleterious mutations. The
result for K_4, which has also been seen in other contexts (Vicoso et al. 2008; Haddrill
et al. 2010), probably reflects the higher intensity of selection for codon usage on the X
versus the A (see Discussion).

Estimates of Demography and Selection on CUB 

We analyzed synonymous polymorphisms in the overlap
region using the model of Haddrill et al. (2011) to detect



Table 4. Estimates of selection, mutation, and demographic parameters for the overlap region.

Model               γ_X    γ_A    θ        κ      λ      g      τ      ln L
L_0                 1.70   1.53   0.0045   3.91   0.79   -      -      -2,366,568.26
L_1                 1.53   1.36   0.0042   3.33   0.75   4.00   0.02   -2,365,196.24
L_1 (γ_X = λγ_A)    -      1.50   0.0012   4.31   1.11   5.57   2.46   -2,365,654.57
L_1 (γ_X = γ_A)     1.39   -      0.0043   3.37   0.67   5.11   0.01   -2,366,051.67

Note. γ_A = 4N_e s_A and γ_X = 4λN_e s_X, where N_e and λN_e are the effective population sizes for autosomal and X-linked loci, respectively; s_A and s_X are the corresponding
heterozygous selection coefficients.
differences in selection on codon usage and effective
population sizes between X and A (see Material and
Methods). ML analyses suggest that an L_1 model with
recent population expansion fits the data significantly
better than the L_0 model with constant population size
(χ² = 2,744; P < 10⁻¹⁶; table 4). In agreement with the results
regarding π_4 described earlier, the ML estimate of λ is 0.75
under L_1. A model that assumed equal selection intensities on
codon usage for the X and A (i.e., s_X = s_A; second to last line of
table 4) fitted significantly less well than the more general
model, implying that the selection coefficients for preferred
versus unpreferred codons are larger on the X than A
(χ² = 916.7; P < 10⁻¹⁶). Finally, we found that the full L_1
model explains the data much better than a reduced
model with γ_X = γ_A (last line of table 4; χ² = 1,711;
P < 10⁻¹⁶), suggesting a higher intensity of selection for
codon usage on the X chromosome.
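
As a check on the model comparisons just described, the following Python sketch recomputes the likelihood-ratio statistics, 2 × (ln L of the general model minus ln L of the nested model), from the ln L values in table 4. The dictionary keys and the function name are illustrative labels of our own and are not part of the original analysis; the resulting values should agree, after rounding, with the statistics quoted in the text (2,744, 916.7, and 1,711).

```python
# Maximized log-likelihoods (ln L) copied from table 4 (overlap region).
# The key names are illustrative labels, not identifiers from the original analysis.
LN_L = {
    "L0": -2366568.26,              # constant population size
    "L1": -2365196.24,              # recent population expansion, full model
    "L1_equal_s": -2365654.57,      # L1 constrained so that s_X = s_A
    "L1_equal_gamma": -2366051.67,  # L1 constrained so that gamma_X = gamma_A
}


def lrt_statistic(lnl_general: float, lnl_nested: float) -> float:
    """Likelihood-ratio statistic: 2 * (ln L_general - ln L_nested)."""
    return 2.0 * (lnl_general - lnl_nested)


# Each nested model is compared with the full expansion model L1.
COMPARISONS = {
    "L1 vs. L0 (constant size)": "L0",
    "L1 vs. L1 with s_X = s_A": "L1_equal_s",
    "L1 vs. L1 with gamma_X = gamma_A": "L1_equal_gamma",
}

for label, nested in COMPARISONS.items():
    stat = lrt_statistic(LN_L["L1"], LN_L[nested])
    print(f"{label}: chi-square = {stat:.1f}")
```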

As a further test for selection, we used the fact that, on the 
null hypothesis of neutrality, the site-frequency spectrum 
when θ is small should be symmetrical about 0.5 regardless
of the degree of mutational bias (e.g., Charlesworth and 
Charlesworth 2010, p. 238); this is true even in the face of 
changes in population size (see Zeng and Charlesworth 
2010b, Appendix). This procedure thus provides a fairly 
robust test for selection. Figure 3 compares the frequency 
spectra for preferred versus unpreferred variants at 
polymorphic synonymous sites in the overlap region. It can 
be seen that X-linked unpreferred variants tend to segregate 
at lower frequencies than their autosomal counterparts 
(30.2% vs. 34.8%), and a one-tailed Mann-Whitney U test 
shows that the difference is statistically highly significant 
(P < 10⁻¹⁵).



Discussion 

Diversity Values on the X Chromosome and 
Autosomes 

African populations are thought to be much closer to the 
ancestral state for D. melanogaster than the European and 
North American populations that have been much more 
intensively studied, where silent site diversity on the X is 
much smaller than for the autosomes (Haddrill et al. 2005; 
Hutter et al. 2007; Pool and Nielsen 2007, 2008). Our results 
agree with previous findings that overall silent nucleotide 
site diversity on the X in African populations is similar in 
magnitude to that for the autosomes (Andolfatto 2001; 
Glinka et al. 2003; Hutter et al. 2007; Singh et al. 2007). But 
Vicoso and Charlesworth (2009a) found that the ratio of 
mean diversity values for X-linked and autosomal loci with 
similar effective recombination rates is close to the value of 
three-quarters expected with purely random variation in 
offspring number in males and females (Wright 1931). 
Our analyses confirm this conclusion, using a much larger 
data set. 
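
The three-quarters expectation can be illustrated numerically. The sketch below uses the standard expressions for effective population size with separate sexes and Poisson variation in offspring number, N_eA = 4 N_m N_f / (N_m + N_f) and N_eX = 9 N_m N_f / (4 N_m + 2 N_f); these formulas are textbook results and are not taken from this section, and the numbers of breeding males and females are hypothetical.

```python
def ne_autosome(n_males: float, n_females: float) -> float:
    """Effective size for autosomal loci with random variation in offspring number."""
    return 4.0 * n_males * n_females / (n_males + n_females)


def ne_x(n_males: float, n_females: float) -> float:
    """Effective size for X-linked loci with random variation in offspring number."""
    return 9.0 * n_males * n_females / (4.0 * n_males + 2.0 * n_females)


def x_to_a_ratio(n_males: float, n_females: float) -> float:
    """Ratio N_eX / N_eA for the given numbers of breeding males and females."""
    return ne_x(n_males, n_females) / ne_autosome(n_males, n_females)


# Hypothetical breeding numbers, chosen only for illustration.
print(x_to_a_ratio(1000, 1000))  # equal numbers of the two sexes
print(x_to_a_ratio(500, 1500))   # an excess of breeding females
```

With equal numbers of breeding males and females the ratio is exactly three-quarters; an excess of breeding females raises it, which is one reason why an observed ratio near 0.75 is informative.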

In contrast, in D. pseudoobscura and D. miranda, the ratio 
of X to A synonymous diversities does not differ significantly 
from three-quarters (Haddrill et al. 2010, 2011). The difference
in X/A diversity ratios between East African D. melanogaster 
and the other two species is consistent with the lower effect- 
ive recombination rate per basepair in D. melanogaster com- 
pared with the other two species, which increases the ability 
of hitchhiking effects such as background selection to cause 
differences between them (Charlesworth 2012b). The results 
described here are thus consistent with the hypothesis 
that hitchhiking effects are responsible for the elevated 
overall effective population size experienced by genes on the
X chromosome in East African populations of D. melanoga- 
ster, relative to that predicted by the standard neutral model 
(Vicoso and Charlesworth 2009a). 

The Causes of the Differences in CUB and GC
Content between the X Chromosome and the
Autosomes

Our analyses of the D. melanogaster genome sequences suggest
that CUB (measured by Fop and γ), and the GC content
at both third coding positions (GC_3) and putatively neutral
short introns (GC_I), appear to be higher overall for the X than
for the autosomes (table 1), as has been reported previously
(Singh et al. 2005a, 2005b, 2008). The same can be seen in
overlap regions with low and intermediate recombination
rates, although this is not true for CUB and GC_I in the
high recombination overlap region (table 2). We now consider
the evidence concerning the possible causes of these
patterns.

Hill-Robertson Effects 

These results regarding CUB and GC contents contrast with
the findings discussed earlier for synonymous diversity in East
African populations of D. melanogaster, which suggest that
the mean effective population size for the X (N_eX) is about
three-quarters of that for the autosomes (N_eA) for loci in the
overlap regions (tables 1, 2, and 4; fig. 2), but that there are
approximately equal chromosome-wide values of N_eX and
N_eA (table 1). If the X versus A differences in CUB were
caused solely by differences in N_e due to HRI, we would not
expect to see stronger selection on CUB for X versus A in the
overlap regions, because with λ ≈ 3/4, we expect γ_X ≈ γ_A
on the assumption of semidominance and equal selection
coefficients in males and females (Vicoso and Charlesworth
2009b); similar considerations apply to GC content, as discussed
in the following section. Furthermore, in D. pseudoobscura
and D. miranda, CUB is also higher for X than A, and
appears to have increased on the XR chromosome arm since
its origin from an autosome (Singh et al. 2008; Vicoso et al.
2008; Haddrill et al. 2011), despite the fact that these species
have a ratio of N_eX to N_eA close to 3/4 as discussed earlier.
These results suggest very strongly that differences in the
intensity of Hill-Robertson effects are not primarily responsible
for the differences in CUB and base composition between
X and A.

Biased Gene Conversion 

Another factor that may influence CUB and GC content is 
biased gene conversion in favor of GC nucleotides (gBGC) — 
the production of a higher frequency of GC versus AT alleles 
in gametes heterozygous for GC/AT (Marais 2003). This affects
CUB in a way similar to selection for preferred codons,
because 21/22 preferred codons in D. melanogaster end in G
or C (Zeng 2010). As there is no meiotic exchange of any kind
between homologs in male Drosophila (Ashburner et al. 
2005), gBGC differentially affects X and A, because X chromo- 
somes spend 2/3 of their time in females as opposed to the 



1/2 spent by the autosomes; it also behaves like weak selec- 
tion on a semidominant allele (Gutz and Leslie 1976; Nagylaki 
1983a, 1983b), and so its strength should be affected by Hill- 
Robertson effects in a similar way to selection on synonymous 
sites, as discussed earlier. 

The change per generation in the frequency q of a GC
allele, caused by gBGC at a site segregating for GC versus
AT, can be written as Δq = ω'q(1 − q), where ω' (the rate
of biased gene conversion) is equivalent to a selection coefficient.
The parameter ω' takes into account both the frequency
of gene conversion events during meiosis and the
extent to which these are biased in favor of GC
(Charlesworth and Charlesworth 2010, p. 528-529). Because
the X chromosome spends two-thirds of its time in females,
where it is exposed to the possibility of gene conversion,
the net rate of gBGC for an X-linked site (ω'_X) is two-thirds
of the rate in females (ω_fX). Similarly, the corresponding
selection coefficient for an autosomal site (ω'_A) is ω_fA/2,
where ω_fA is the autosomal rate of gBGC in females. Thus,
ω'_X/ω'_A = 4ω_fX/(3ω_fA).

The equilibrium value of the GC content of a stretch of
sequence under mutation, gBGC and drift is determined
jointly by N_e ω' and the level of mutational bias in favor of
GC > AT versus AT > GC mutations (Bulmer 1991;
Charlesworth and Charlesworth 2010, p. 275, 529). If λ =
N_eX/N_eA ≈ 3/4 in the overlap region, as suggested by the
results on diversity discussed earlier, then N_eX ω'_X/(N_eA ω'_A) =
ω_fX/ω_fA, that is, it is equal to the ratio of the rate of female
BGC on the X to that for the autosomes. It follows that, if the
level of mutational bias is similar for the two chromosomes,
the relative equilibrium GC contents of X and A for the overlap
region should increase with ω_fX/ω_fA; they are equal when
ω_fX/ω_fA = 1. A recent study has shown that the rates of initiation
of gene conversion events in female meiosis in D. melanogaster
seem to be similar for X and A, and are relatively
uniform across chromosomes (Comeron et al. 2012), except
for the low recombination regions that have been excluded
from this study. Furthermore, these authors did not find a
positive correlation between GC content and gene conversion
rate as postulated by the gBGC model (Marais 2003). It
thus seems unlikely that ω_fX/ω_fA exceeds one for these genes,
unless the extent of GC bias per conversion event is different
for X and A. Although this possibility cannot be definitively
excluded, it seems implausible that gBGC alone could account
for the differences in base composition or Fop between
X and A in the low- and intermediate-recombination frequency
overlap regions.
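
The arithmetic behind this argument can be verified with exact fractions. In the sketch below, the function and constant names are our own; the inputs are the per-female-meiosis gBGC rates for the X and the autosomes (written ω_fX and ω_fA in the text), the fractions of time spent in females (2/3 for the X, 1/2 for the autosomes), and the ratio λ = N_eX/N_eA.

```python
from fractions import Fraction

# Fraction of generations that a chromosome spends in females, where
# gene conversion can occur (no meiotic exchange in male Drosophila).
TIME_IN_FEMALES_X = Fraction(2, 3)
TIME_IN_FEMALES_A = Fraction(1, 2)


def net_gbgc_ratio(omega_fx, omega_fa):
    """Ratio of net gBGC rates, omega'_X / omega'_A."""
    return (TIME_IN_FEMALES_X * omega_fx) / (TIME_IN_FEMALES_A * omega_fa)


def scaled_gbgc_ratio(lam, omega_fx, omega_fa):
    """Ratio N_eX * omega'_X / (N_eA * omega'_A), where lam = N_eX / N_eA."""
    return lam * net_gbgc_ratio(omega_fx, omega_fa)


# With equal female rates on X and A, the net rate ratio is 4/3 ...
print(net_gbgc_ratio(Fraction(1), Fraction(1)))
# ... and with lam = 3/4 the N_e-scaled intensities of gBGC are equal.
print(scaled_gbgc_ratio(Fraction(3, 4), Fraction(1), Fraction(1)))
```

Under these assumptions, a higher GC content on the X at equilibrium would require the female rate of gBGC to be higher for the X than for the autosomes.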

Different Selection Pressures on X Genes 
Versus A Genes 

The higher CUB and GC content of the X chromosome might 
be due to stronger selection for preferred codons and/or GC 
versus AT on X genes compared with A genes. This possibility 
is supported by our analysis of polymorphism data for the 
overlap regions in D. melanogaster (table 4 and fig. 3), con- 
sistent with results on D. pseudoobscura and D. miranda 
(Haddrill et al. 2011). With N eX = 3N eA /4, selection can be 
stronger on the X (as measured by γ_X and γ_A) because hemizygosity
in males leads to higher sex-averaged selection coefficients
for X-linked loci, which in turn enhances the efficacy
of natural selection on CUB or GC content relative to the
autosomes (McVean and Charlesworth 1999; Singh et al.
2005a; Vicoso and Charlesworth 2009b). Thus, the relative
Fop or GC contents of the X versus A may depend on the
dominance coefficient (h) with respect to the fitness effects
of unpreferred mutations.

To investigate whether dominance could be the cause 
of the higher level of CUB observed in this study, we can 
compare the ratio of mean values of Fop for X versus auto- 
somes (Fop x /Fop A ) to the theoretical predictions of McVean 
and Charlesworth (1999), which assumed that selection co- 
efficients were the same in both sexes. These show that a 
Fop x /Fop A value of approximately 1.002 is expected when 
h = 0, the most favorable case for stronger selection on the 
X (supplementary material 2, Supplementary Material 
online). As the lowest value for any of the CIs calculated for 
Fop x /Fop A in this study is above 1.002, except for the high 
recombination overlap region (where it is 0.998), it is unlikely 
that this effect alone can cause the higher CUB and GC con- 
tent on the X, in agreement with the conclusions of Singh 
et al. (2005a). The intuitive reason for this is that the equilib- 
rium level of CUB is controlled by the ratio of the fixation 
probability of mutations from unpreferred to preferred 
codons to that for mutations from preferred to unpreferred 
codons (Bulmer 1991, McVean and Charlesworth 1999). 
When N eX = 3N eA /4, recessivity for the fitness effects of unpre- 
ferred mutations (h < 0.5) reduces their probability of fixation 
on the X chromosome relative to the autosomes; it also re- 
duces the probability of fixation of mutations from unpre- 
ferred to preferred codons on the X chromosome relative to 
the autosomes (Vicoso and Charlesworth 2009b). The two 
effects almost exactly cancel out. 
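
The dependence of equilibrium CUB on the ratio of fixation probabilities can be sketched for the simplest case of semidominant selection; this is the standard mutation-selection-drift result and does not model the dominance effects (h < 0.5) discussed above. The function names are ours. The sketch assumes that κ is the ratio of the mutation rate from preferred to unpreferred codons to the rate in the opposite direction, and it uses the L_1 estimates from table 4 (γ_X = 1.53, γ_A = 1.36, κ = 3.33) purely as illustrative inputs.

```python
import math


def relative_fixation_rate(gamma: float) -> float:
    """Fixation probability relative to a neutral mutation, for a semidominant
    mutation with scaled selection coefficient gamma (negative if deleterious)."""
    if gamma == 0.0:
        return 1.0
    return gamma / (1.0 - math.exp(-gamma))


def equilibrium_fop(gamma: float, kappa: float) -> float:
    """Equilibrium frequency of preferred codons under mutation, semidominant
    selection, and drift, obtained by balancing the two substitution fluxes.

    kappa is assumed to be (rate preferred -> unpreferred) divided by
    (rate unpreferred -> preferred).
    """
    to_preferred = relative_fixation_rate(gamma)      # favored direction
    to_unpreferred = relative_fixation_rate(-gamma)   # disfavored direction
    return to_preferred / (to_preferred + kappa * to_unpreferred)


# Illustrative inputs: L_1 estimates from table 4.
fop_x = equilibrium_fop(1.53, 3.33)
fop_a = equilibrium_fop(1.36, 3.33)
print(fop_x, fop_a, fop_x / fop_a)
```

Because the ratio of the two fixation rates equals exp(γ), the balance reduces to Fop = 1/(1 + κ exp(−γ)), so that a larger γ gives a higher equilibrium Fop for the same mutational bias.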

We have also investigated the possible effects of 
female-specific selection when N eX = 3N eA /4 by extending 
the approach of McVean and Charlesworth (1999) for calcu- 
lating the equilibrium frequencies of preferred codons in the 
genome under mutation, selection and drift (supplementary 
material 2, Supplementary Material online). For the same se- 
lection coefficient for X and A, the predicted equilibrium 
values of Fop x /Fop A with selection purely on females for 
X-linked genes, but on both sexes for autosomal loci, are 
always less than 1 and greater than about 0.6 for the
γ values with highest likelihood shown in table 4, regardless
of the value of h, as might be expected in view of the fact that
there is less overall selection on the X-linked genes; the exact 
values depend on h and the extent of mutational bias. If there 
is female-specific selection on the X, and either mode of 
sex-specific selection on the autosomes, Fop x /Fop A is approxi- 
mately 1, regardless of h and the level of mutational bias, 
which is in conflict with the observations. Dominance alone 
cannot, therefore, explain the observed pattern of higher 
codon usage on the X. 

It is also worth noting that the X/A ratio of equilibrium 
synonymous diversity levels under selection for codon usage 
with semidominance and equal selection in both sexes is 



expected to be approximately 0.75, as is observed for the 
overlap region (table 1), whereas it is reduced to around 
0.70 with h = 0.2 (McVean and Charlesworth 1999). 
However, with female-specific selection on the X and 
sex-specific selection of either type on the autosomes, appli- 
cation of the method of McVean and Charlesworth (1999) 
shows that the X/A ratio of synonymous diversities is 0.75, 
regardless of h and the level of mutational bias (supplemen- 
tary material 2, Supplementary Material online). With 
female-specific selection on the X and no sex-specific selec- 
tion on the autosomes, the results depend on both h and the 
degree of mutational bias. This suggests that selection on CUB 
either involves semidominance without sex-specific selection, 
or highly sex-specific selection for both X and A genes. 

Overall, these results imply that selection coefficients
acting on homozygous or hemizygous variants affecting Fop
or GC content must be stronger on the X than the autosomes
(see also Zeng and Charlesworth 2010a). In agreement with
this conclusion, the scaled selection coefficient for the
best-fitting model of semidominant selection (L_1) was estimated
from the polymorphism data to be higher on the
X (γ_X = 1.53) than the autosomes (γ_A = 1.36) for the overlap
region (table 4). For a selection model with semidominance,
when λ = 0.75, as suggested by our results (table 4), the corresponding
ratio of selection coefficients for genes on X versus
A is equal to γ_X/γ_A (Vicoso and Charlesworth 2009b), that is,
1.53/1.36 ≈ 1.12. This stronger selection at X-linked loci for the
overlap region of D. melanogaster is consistent with the pattern
inferred in D. pseudoobscura and D. miranda (Haddrill
et al. 2011).

The generally lower K 4 values for X versus A (tables 1 and 
2) lend further support to the suggestion of stronger net selec- 
tion on codon usage on the X, whatever its source. Equations 
(6.10) and (6.11) of Charlesworth and Charlesworth (2010, 
p. 275) can be used to assess the approximate expected 
ratio of K 4 for X to that for A, on the assumption of 
drift-mutation-selection equilibrium. The predicted ratio is 
given by 

$$\frac{K_{4X}}{K_{4A}} \approx \frac{\mathrm{Fop}_X \, \gamma_X \, [\exp(\gamma_A) - 1]}{\mathrm{Fop}_A \, \gamma_A \, [\exp(\gamma_X) - 1]}$$

where subscripts X and A represent values for the X chromo- 
some and autosomes, respectively. Using the estimates from 
tables 1 and 4, the predicted value of K 4X /K 4A is 0.968 for the 
overlap region, which is not significantly different from the 
observed ratio of 0.960. 
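
The structure of this prediction can be explored with a short sketch. The function below evaluates Fop_X γ_X [exp(γ_A) − 1] / (Fop_A γ_A [exp(γ_X) − 1]); its name and argument names are ours. The Fop values come from table 1, which is not reproduced here, so they are left as arguments; setting both to 1 isolates the factor that depends only on the γ estimates from table 4. The observed ratio is recomputed from the K_4 values quoted in the Results for the overlap region.

```python
import math


def predicted_k4_ratio(fop_x: float, fop_a: float,
                       gamma_x: float, gamma_a: float) -> float:
    """Predicted K_4X / K_4A at drift-mutation-selection equilibrium."""
    numerator = fop_x * gamma_x * (math.exp(gamma_a) - 1.0)
    denominator = fop_a * gamma_a * (math.exp(gamma_x) - 1.0)
    return numerator / denominator


GAMMA_X, GAMMA_A = 1.53, 1.36  # L_1 estimates from table 4

# Factor due to the difference in gamma alone (Fop_X = Fop_A assumed here
# only to isolate this factor; the actual Fop values are in table 1).
selection_factor = predicted_k4_ratio(1.0, 1.0, GAMMA_X, GAMMA_A)

# Observed ratio from the overlap-region K_4 values quoted in the Results.
observed_ratio = 0.238 / 0.248

print(f"gamma-only factor: {selection_factor:.3f}")
print(f"observed K_4X/K_4A: {observed_ratio:.3f}")
```

The γ-only factor is below one because stronger selection lowers the fixation rate of mutations to unpreferred codons; the higher Fop on the X partly offsets this in the full prediction.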

The fact that K 4 for the X chromosome is substantially 
lower than K 4 for the autosomes because of selection on 
CUB, as was also found for D. pseudoobscura (Vicoso et al. 
2008; Haddrill et al. 2010), means that caution must be used in 
interpreting the difference in K 0 /K 4 between X and A in the 
overlap region (0.172 for X and 0.152 for A in table 1) as 
evidence for faster adaptive evolution of nonsynonymous 
mutations on the X; the difference in K 0 is only marginally 
significant, whereas the difference in K 4 is highly significant. 
Estimates of the proportions of nonsynonymous mutations 
fixed by positive selection (a), based on the comparison 
of the ratio of the numbers of 0-fold and 4-fold polymorphisms
to K_0/K_4 (McDonald and Kreitman 1991; Fay et al. 2002;
Smith and Eyre-Walker 2002), will be correspondingly more 
upwardly biased for the X than A. This casts some doubt 
on recent claims for a "faster-X" effect for D. melanogaster 
based on population genomic data (Langley et al. 2012; 
Mackay et al. 2012). 
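
The direction of this bias can be seen from the usual McDonald-Kreitman-based estimator, α = 1 − (K_4/K_0)(P_n/P_s), where P_n/P_s is the ratio of the numbers of 0-fold to 4-fold polymorphisms. The sketch below is only an illustration: the function name is ours, the P_n/P_s value is hypothetical, and K_0 is held at the autosomal value so that only the effect of K_4 is shown.

```python
def mk_alpha(k0: float, k4: float, pn_over_ps: float) -> float:
    """Estimate of the proportion of adaptive nonsynonymous substitutions,
    alpha = 1 - (K_4 / K_0) * (P_n / P_s)."""
    return 1.0 - (k4 / k0) * pn_over_ps


PN_OVER_PS = 0.1  # hypothetical value, chosen only for illustration
K0 = 0.038        # autosomal K_0 for the overlap region, held fixed

alpha_autosomal_k4 = mk_alpha(K0, 0.248, PN_OVER_PS)  # K_4A, overlap region
alpha_x_k4 = mk_alpha(K0, 0.238, PN_OVER_PS)          # K_4X, overlap region

print(f"alpha with K_4A: {alpha_autosomal_k4:.3f}")
print(f"alpha with K_4X: {alpha_x_k4:.3f}")
```

With everything else held equal, the lower K_4 of the X gives a larger estimate of α, even though the rate of nonsynonymous substitution is unchanged.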

The good fit of the X/A ratio of K 4 to the predictions of the 
effects of selection on CUB implies that it is unlikely that a 
higher male than female mutation rate explains the lower K 4 
for X than A. It has recently been suggested by Zhou and 
Bachtrog (2012) that the higher K 4 with respect to 
D. pseudoobscura, observed for genes on the nonrecombining 
D. miranda neo-Y chromosome when compared with their 
counterparts on the neo-X chromosome, is due to a higher 
male mutation rate; however, this effect is also consistent 
with a relaxation of selection on CUB caused by the reduced 
effective population size of the neo-Y chromosome. 

The Role of Gene Expression 

Singh et al. (2005a) suggested that a higher level of CUB for X 
genes could have been selected for if dosage compensation of 
the X chromosome in males for the loss of function of its 
Y-linked partner is incomplete. However, this seems unlikely 
in view of the evidence for the high efficiency of the dosage 
compensation system in Drosophila (Lucchesi et al. 2005); 
moreover, the slightly higher level of gene expression in 
males than in females for X-linked genes (table 1) seems in- 
consistent with this possibility. 

However, table 1 shows that the mean level of expression 
of X chromosome genes in female D. melanogaster is some- 
what higher than that of autosomal genes (see also Gupta 
et al. 2006; Sturgill et al. 2007; Zhang and Oliver 2010). 
As higher gene expression levels are associated with stronger 
selection for CUB (Duret and Mouchiroud 1999; Drummond 
and Wilke 2008; Zeng and Charlesworth 2009), this pattern of 
gene expression might account for the higher level of CUB 
and GC_3 on the X, because more weight is given to females
than to males with respect to selection on the X when there is
intermediate dominance, as has already been emphasized
several times. At the suggestion of a reviewer, we tested
this possibility by examining the linear and Loess regressions 
of Fop for X and A separately, on the weighted average of 
adult female and male expression levels (see Material and 
Methods). As can be seen from supplementary material 3, 
Supplementary Material online, for the same expression level 
Fop for the overlap region of the X is consistently higher than 
Fop for the overlap region of A, except for the comparatively 
small number of genes with very high expression levels. This 
falsifies the hypothesis that a difference in expression level 
caused the differences in mean Fop between X and A. The 
cause of the apparent difference between X and A in selection 
intensity on CUB thus remains obscure. 

Mutational Bias Effects and the Recombinational 
Landscape of Drosophila 

In addition, it is hard to explain the higher GC content in 
short introns (GC_I) on the X versus A, which is found both



overall and in the low and intermediate recombination re- 
gions (tables 1 and 2), and the negative relationship between 
recombination rate and GC content/CUB on the X but not A. 
We first examine the question of the X/A difference in in- 
tronic GC content. A lower rate of GC > AT mutations rela- 
tive to AT > GC mutations on the X compared with A could 
potentially explain the higher GC content of both coding and 
intronic sequences. The analysis of Zeng and Charlesworth 
(2010a), however, provided no support for a lower GC > AT 
mutational bias for X genes. We have also fitted a model of 
selection on codon usage for the overlap region, similar to 
that used to generate table 4, but allowing potentially differ- 
ent mutational biases for X and A (supplementary material 4, 
Supplementary Material online). If anything, the estimated
mutational bias for X was greater than for A (κ_X = 4.17 vs.
κ_A = 3.23). Thus, mutational bias per se seems to be incapable
of explaining the X versus A differences in GC content
or CUB. 

The negative relationship between recombination rate and 
GC content/CUB on the X but not A (Singh et al. 2005b) also 
remains unexplained. This effect can be seen in the overlap 
regions as well as over the whole X (table 2 and fig. 1). Note, 
however, that regions of the X chromosome that lack crossing 
over, such as the pericentric and telomeric heterochromatin, 
have highly reduced Fop and GC contents, consistent with 
strong Hill-Robertson effects in these regions (Campos et al. 
2012). Singh et al. (2005b) proposed that the recombinational 
landscape in the D. melanogaster euchromatin may have 
changed over a timescale shorter than that required for equilibration
of CUB and base composition, converting a previously
positive correlation between Fop/GC content and local recombination
rate on the X into a negative one, and a positive
correlation on the autosomes into a near-zero one. 

Given the significantly higher values of mean π_4 for the
high versus the low recombination overlap regions, for both 
X and A (tables 1 and 2), it is clear that the negative relation 
between Fop/GC content and recombination rate for the 
X chromosome, and the lack of such a relation for the auto- 
somes, are inconsistent with the assumption that their cur- 
rent values are at mutation-selection-drift equilibrium under 
the N e values for the different recombination regions sug- 
gested by the diversity data. This supports the proposal of 
Singh et al. (2005b) and is consistent with other evidence that 
the D. melanogaster genome is out of equilibrium (reviewed 
by Zeng and Charlesworth 2010a). Genome-wide surveys 
of variability and divergence, as well as fine-scale genetic 
maps of D. melanogaster and its close relatives, should help 
to shed light on this problem. 

Conclusions 

Our analyses show that 

1) When differences in effective recombination rates be- 
tween X and A in Drosophila, mainly due to the lack of 
crossing over in males, are taken into account, the effect- 
ive population size of the X in the Rwandan population 
of D. melanogaster (as estimated from 4-fold degenerate 
site diversity) is approximately three-quarters of that for 
the autosomes, the value expected with neutrality and
random variation in offspring number. 

2) In contrast, the level of CUB remains higher for the X 
than for the A when a similar adjustment for recombin- 
ation rate is made. 

3) This feature of CUB is consistent with estimates from 
polymorphism data that indicate stronger selection on 
variants affecting codon usage on X versus A in regions 
with comparable effective recombination rates. 

4) The stronger selection on CUB on the X means that 
estimates of the rate of adaptive evolution of protein
sequences based on the McDonald-Kreitman
test are more upwardly biased for the X than A. 

5) We appear to have ruled out both dominance and the 
higher average level of expression in females of X genes 
compared with A genes as explanations for this stronger 
apparent selection for CUB on the X. 

6) Mutational bias and biased gene conversion are also 
not capable of explaining these patterns. In addition, 
the higher GC content of short introns on X versus A, 
and the negative relation between recombination rate 
and codon usage on the X, remain to be explained. 

Supplementary Material 

Supplementary materials 1-4 are available at Molecular 
Biology and Evolution online (http://www.mbe.oxfordjournals.org/).